Statistics I
Lisbon Accounting and Business School – Polytechnic University of Lisbon
These slides are a free translation and adaptation of the slide deck for Estatística I by Prof. Sandra Custódio and Prof. Teresa Ferreira from the Lisbon Accounting and Business School - Polytechnic University of Lisbon.
A random variable is a function that allows us to quantify (turn into a number) each outcome.
Random Variable
A random variable (r.v.) \(X\) is a function \(X:\Omega\rightarrow \Omega_X\subset \mathbb{R}\). \(\Omega_X\) is known as the support of the r.v. \(X\).
\[\omega\in\Omega \overset{X}{\rightarrow} X(\omega)\in\Omega_X\subset\mathbb{R}\]
\(X(\omega)\) is the image under \(X\) of the outcome \(\omega\).
In summary, a r.v. is a function that associates a real number with each outcome in \(\Omega\).
\(X\) is a discrete r.v. when its support \(\Omega_X\) is finite or countably infinite.
In this case, \(\Omega_X=\{x_1, x_2, ... , x_n\}\) with \(n\in\mathbb{N}\) if \(\Omega_X\) is finite, and \(\Omega_X=\{x_1,x_2,...,x_n,...\}\) if it is countably infinite.
Let \(X\) be a discrete r.v. The pdf of \(X\) is a function \(f_X:\mathbb{R}\rightarrow\mathbb{R}\) such that:
\[f_X(x)=\left\{\begin{array}{cc}P(X=x) & ,\text{ if } x\in\Omega_X\\ 0 & ,\text{ if } x\in\mathbb{R}\setminus\Omega_X\end{array}\right.\]
Naturally, by construction the pdf satisfies the following properties:
- \(f_X(x)\geq 0 \quad \forall x\in\mathbb{R}\)
- \(\sum_{x_i\in\Omega_X}P(X=x_i)=1\)
The pdf gives the probability at a single point. The total probability is distributed among the single points \(x_i\). A reasonable representation of the pdf of a discrete r.v. is a table:
\(x\) | \(x_1\) | \(x_2\) | … | \(x_n\) | … |
---|---|---|---|---|---|
\(f(x)\) | \(p_1\) | \(p_2\) | … | \(p_n\) | … |
Where \(p_i=P(X=x_i)\)
Consider the discrete r.v. \(X\) with the following pdf:
\(x\) | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
\(f(x)\) | \(0.05\) | \(a\) | \(0.35\) | \(0.25\) | \(0.05\) |
Since the total probability must equal 1, we define: \(a=1-(0.05+0.35+0.25+0.05)=0.3\).
Our table is now:
\(x\) | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
\(f(x)\) | \(0.05\) | \(0.3\) | \(0.35\) | \(0.25\) | \(0.05\) |
What is \(P(X=2|X\leq 3)\)?
\[P(X=2|X\leq 3)=\frac{P(X=2 \cap X\leq 3)}{P(X\leq 3)}= \frac{P(X=2)}{P(X\leq 3)}\]
\[= \frac{f(2)}{f(0)+...+f(3)}=\frac{0.35}{0.95}\approx 0.368\]
\(X\) is a continuous r.v. when its support \(\Omega_X\) is uncountable, typically an interval or a union of intervals.
Let \(X\) be a continuous r.v.
There is a function \(f_X:\mathbb{R}\rightarrow\mathbb{R}\), the pdf of \(X\), such that \(f_X(x)\geq 0\quad\forall x\in\mathbb{R}\) and \(\int_{-\infty}^{+\infty}f_X(x)\,dx=1\).
Technically, from Measure Theory, we need an absolutely continuous r.v. to ensure the existence of a pdf. These issues are beyond the scope of this course. Just know that when we say continuous r.v. we mean absolutely continuous r.v.
Note that this pdf allows us to compute the probability of events \(\{X\in(a,b]\}\):
\[P(a<X\leq b)=\int_a^b f_X(x)dx\]
Observe that setting \(b=a\) gives the integral from \(a\) to \(a\), which equals 0; hence \(P(X=a)=0\) for any single point \(a\).
Let \(X\) be a continuous r.v. with the following pdf:
\[ f(x)=\left\{\begin{array}{cc} \theta x^2 & , 0\leq x< 1\\ 0 & , x\in\mathbb{R}\setminus [0,1) \end{array}\right. \]
Support for \(X\): \(\Omega_X=[0,1)\)
\[\int_{-\infty}^{\infty}f(x)\,dx=1\Leftrightarrow\int_0^1\theta x^2\,dx=1\Leftrightarrow\left[\theta\frac{x^3}{3}\right]_{0}^1=1\Leftrightarrow\frac{\theta}{3}=1\Leftrightarrow\theta=3\]
Let \(X\) be a r.v. The distribution function (cdf) of \(X\) is the function \(F_X:\mathbb{R}\rightarrow[0,1]\) defined as:
\[F_X(x)=P(X\leq x)\]
\(F_X\) is unique.
With a discrete r.v.
\(x\) | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
\(f(x)\) | \(0.05\) | \(0.3\) | \(0.35\) | \(0.25\) | \(0.05\) |
\[F(x)=P(X\leq x)=\left\{ \begin{array}{cc} 0 & x<0 \\ 0.05 & 0\leq x < 1 \\ 0.05 + 0.3 = 0.35 & 1 \leq x < 2 \\ 0.35 + 0.35 = 0.7 & 2 \leq x < 3 \\ 0.7 + 0.25 = 0.95 & 3 \leq x < 4 \\ 1 & x\geq 4 \end{array} \right.\]
Let’s revisit our previous example:
\[P(X=2|X\leq 3)= \frac{P(X=2)}{P(X\leq 3)}=\] \[\frac{F(2)-F(2^-)}{F(3)}= \frac{0.7-0.35}{0.95}\approx 0.368 \]
With a continuous r.v.:
\[F_X(x)=P(X\leq x)=\int_{-\infty}^{x} f_X(t)\,dt\]
The distribution function \(F_X\) allows us to compute the probability of \(\{X\in(a,b]\}\):
\[P(a<X\leq b)=\int_a^b f_X(x)dx=F_X(b)-F_X(a)\]
Consider the continuous r.v. defined previously, with the pdf:
\[ f(x)=\left\{ \begin{array}{cc} 3x^2 & , 0\leq x< 1\\ 0 & , x\in\mathbb{R}\setminus [0,1) \end{array} \right. \]
Support for \(X\): \(\Omega_X=[0,1)\)
Distribution function (cdf):
\[ F(x)=P(X\leq x) = \int_{-\infty}^x f(t)dt = \left\{ \begin{array}{cc} 0 & , x<0\\ x^3 & ,0\leq x<1 \\ 1& ,x\geq 1 \end{array} \right. \]
Whether the r.v. is discrete or continuous, \(F_X\) has the following properties:
\(F_X\) for a discrete r.v. \(X\)
\(F_X\) for a continuous r.v. \(X\)
For a discrete r.v.
\[P(X=x)=F_X(x)-F_X(x^-)\qquad\text{and, conversely,}\qquad F_X(x)=\sum_{x_i\leq x}P(X=x_i)\]
For a continuous r.v.
\[ f_X(x)=\left\{ \begin{array}{cc} F_X'(x) & ,x\in\mathbb{R} \text{ if }F_X'\text{ exists} \\ 0 & \text{, otherwise} \end{array} \right. \]
(differentiate \(F_X\) to obtain \(f_X\); integrate \(f_X\) to recover \(F_X\))
\[ F_X(x)=\int_{-\infty}^x f_X(t)dt\]
The pdf of a continuous r.v. \(X\) is not unique: it can be modified at isolated points without changing any probability.
The range of a r.v. \(X\) can be described as a population, in the statistical sense, because it contains all the possible values the variable can take.
We can summarize it with numerical values that represent the centrality or dispersion of the data.
The expected value, or mean, is a location parameter for our r.v.
Definition
The expected value or mean of a random variable \(X\) is:
\[E[X]=\sum_{x_i\in\Omega_X}x_i\,P(X=x_i)\ \text{ (discrete case)}\qquad E[X]=\int_{-\infty}^{+\infty}x\,f_X(x)\,dx\ \text{ (continuous case)}\]
Not all random variables have an expected value: the defining sum or integral may fail to converge.
Let \(X,Y\) be r.v.s and \(a,b\in\mathbb{R}\) scalars. Some properties of the mean:
- \(E[b]=b\)
- \(E[aX+b]=aE[X]+b\)
- \(E[X+Y]=E[X]+E[Y]\)
Let \(X\) be a discrete r.v. as in the previous example:
\(x\) | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
\(f(x)\) | \(0.05\) | \(0.3\) | \(0.35\) | \(0.25\) | \(0.05\) |
Let \(g(X)=2(X-1)^2+3(X-1)-5\), find \(E[g(X)]\).
\[g(X)=2(X-2)^2+3(X-1)-5\] \[=2(X^2-2X+1)+3X-3-5\] \[=2X^2-4X+2+3X-8\] \[=2X^2-X-6\]
\[E[Y]=E[2X^2-X-6]\]
\[=2E[X^2]-E[X]-6\]
We only need to find \(E[X]\) and \(E[X^2]\) to obtain \(E[g(X)]\).
\[E[X]=\sum_x xP(X=x)\] \[ = 0\times 0.05 + 1 \times 0.3 + 2 \times 0.35 + 3 \times 0.25 + 4\times 0.05 = 1.95\]
\[E[X^2]=\sum_x x^2 P(X=x)\]
\[ = 0\times 0.05 + 1 \times 0.3 + 4 \times 0.35 + 9 \times 0.25 + 16\times 0.05 = 4.75\]
\[E[g(X)]=2\times 4.75 - 1.95 - 6 = 1.55\]
Recall our example of a continuous r.v. \(X\): \[ f_X(x)=\left\{ \begin{array}{cc} 3x^2 & ,0\leq x < 1 \\ 0 & , x\in\mathbb{R}\setminus[0,1) \end{array} \right. \]
Find \(E[g(X)]\) when \(g(X)=2(X-1)^2+3(X-1)-5\). We already know \(g(X)=2X^2-X-6\), so let's focus on \(E[X]\) and \(E[X^2]\).
\[E[X]=\int_{-\infty}^{\infty} xf_X(x)dx = \int_0^1 x\times 3x^2 dx\] \[= \int_0^1 3x^3dx=\left[3\frac{x^4}{4}\right]_{0}^1=\frac{3}{4}=0.75\]
\[E[X^2]=\int_{-\infty}^{\infty} x^2f_X(x)dx = \int_0^1 x^2\times 3x^2 dx\] \[= \int_0^1 3x^4dx=\left[3\frac{x^5}{5}\right]_{0}^1=\frac{3}{5}=0.6\]
Finally,
\(E[g(X)]=2\times 0.6 - 0.75 - 6 = -5.55\)
The p-quantile, \(x_p\), of a r.v. \(X\) is a location parameter.
For a discrete r.v., \(x_p\) is a value \(x\in\Omega_X\) such that \(P(X<x)\leq p\leq P(X\leq x)\); or, what is the same, such that \(F_X(x^-)\leq p \leq F_X(x)\).
For a continuous r.v., \(x_p\) is an \(x\in\Omega_X\) such that \(F_X(x)=p\).
Let’s apply this for the examples we just used for the expected value.
Find the median (0.5-quantile) for \(X\)
\[ F(x)=P(X\leq x)=\left\{ \begin{array}{cc} 0 &, x< 0\\ 0.05 &, 0\leq x <1 \\ 0.35 &, 1\leq x < 2 \\ 0.7 &, 2 \leq x < 3 \\ 0.95 &, 3 \leq x < 4 \\ 1 &, x\geq 4 \end{array} \right. \]
For example, \(F(2^-)=0.35\leq 0.5 \leq 0.7=F(2)\) and therefore \(x_{0.5}=Me = 2\). Given that \(E[X]=1.95<Me(X)=2\), the distribution is slightly negatively (or left) skewed.
\[F_X(x)=P(X\leq x)=\left\{ \begin{array}{cc} 0 & x < 0 \\ x^3 & 0\leq x <1 \\ 1 & x\geq 1 \end{array} \right. \]
Let’s find \(x\) such that \(F(x)=0.5\)
\(F(x)=0.5\Leftrightarrow x^3=0.5\Leftrightarrow x=\sqrt[3]{0.5}\approx0.7937\)
And therefore, \(x_{0.5}=Me\approx 0.7937\).
Let \(X\) be a r.v. The variance of \(X\), if it exists, is defined as:
\[V[X]=E\left[\left(X-E[X]\right)^2\right]\]
It can be shown, with some algebraic manipulation, that \(V[X]=E\left[X^2\right]-\left(E[X]\right)^2\): expanding the square and using linearity of the mean, \(E[(X-\mu_X)^2]=E[X^2]-2\mu_X E[X]+\mu_X^2=E[X^2]-\mu_X^2\).
Remember that \(E[X]\equiv\mu_X\)
Usually we write \(V[X]\) as \(\sigma^2_X\).
Some properties of the variance, for \(a,b\in\mathbb{R}\):
- \(V[b]=0\)
- \(V[aX+b]=a^2V[X]\)
- if \(X\) and \(Y\) are independent, \(V[X\pm Y]=V[X]+V[Y]\)
If \(\sigma^2_X\) is the variance of \(X\), then the standard deviation is defined as: \[\sigma_X=\sqrt{V[X]}\]
One characteristic of the standard deviation is that its units are the same as those of the random variable.
While the variance and standard deviation measure the dispersion of the data, we might want to express it relative to the mean (a \(\sigma_X=1\) can be a lot when \(X\) takes relatively low values, but negligible if we are talking in millions!).
For that we use the coefficient of variation:
\[C.V._X =\frac{\sigma_X}{\mu_X}\times 100\]
Some properties of the \(CV_X\)
Let’s compute \(\sigma^2\), \(\sigma\), and \(CV\) for our previous examples:
Values of \(CV\) below \(50\%\) allow us to regard \(\mu\) as representative of the data: the lower the \(CV\), the closer the data is to \(\mu\) and the more representative the mean is.
The computations for both the discrete and the continuous case are sketched below.
When running an experiment, it could be interesting to study the relationship between two numeric features associated to each of the outcomes.
Random pair
A random pair \((X,Y)\) is a function \((X,Y):\Omega\rightarrow \left(\Omega_X,\Omega_Y\right)\subset\mathbb{R}^2\). \(\left(\Omega_X, \Omega_Y\right)\) is known as the support of the random pair \((X,Y)\).
\[\omega\in\Omega \overset{(X,Y)}{\rightarrow}\left(X(\omega),Y(\omega)\right)\in(\Omega_X,\Omega_Y)\subset\mathbb{R}^2\]
\(X(\omega)\) is the image under \(X\) of the outcome \(\omega\), and \(Y(\omega)\) is the image under \(Y\) of the same outcome.
A random pair \((X,Y)\) is discrete when both \(X\) and \(Y\) are discrete r.v.s, i.e. when its support is finite or countably infinite.
Let \((X,Y)\) be a discrete random pair. The joint density function \(f_{X,Y}(x,y)\) is a function \(f_{X,Y}:\mathbb{R}^2\rightarrow\mathbb{R}\) defined as:
\[ f_{X,Y}(x,y)=\left\{ \begin{array}{cl} P(X=x,Y=y) & , (x,y)\in(\Omega_X,\Omega_Y)\\ 0 & , (x,y)\in\mathbb{R}^2\setminus(\Omega_X,\Omega_Y) \end{array} \right. \]
\(f_{X,Y}\) satisfies the following properties:
- \(f_{X,Y}(x,y)\geq 0\quad\forall(x,y)\in\mathbb{R}^2\)
- \(\sum_{i=1}^\infty\sum_{j=1}^\infty P(X=x_i,Y=y_j)=1\)
A possible notation for \(P(X=x_i,Y=y_j)\) is \(p_{ij}\)
\(X\setminus Y\) | \(y_1\) | \(y_2\) | \(\dots\) | \(y_j\) | \(\dots\) | \(f_X(x)\) |
---|---|---|---|---|---|---|
\(x_1\) | \(p_{11}\) | \(p_{12}\) | \(\dots\) | \(p_{1j}\) | \(\dots\) | \(\sum_{j=1}^\infty p_{1j}\) |
\(x_2\) | \(p_{21}\) | \(p_{22}\) | \(\dots\) | \(p_{2j}\) | \(\dots\) | \(\sum_{j=1}^\infty p_{2j}\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) |
\(x_i\) | \(p_{i1}\) | \(p_{i2}\) | \(\dots\) | \(p_{ij}\) | \(\dots\) | \(\sum_{j=1}^\infty p_{ij}\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) |
\(f_Y(y)\) | \(\sum_{i=1}^\infty p_{i1}\) | \(\sum_{i=1}^\infty p_{i2}\) | \(\dots\) | \(\sum_{i=1}^\infty p_{ij}\) | \(\dots\) | 1 |
Given a random pair \((X,Y)\), the marginal probability functions of \(X\) and \(Y\) are, respectively:
\[f_X(x_i)=P(X=x_i)=\sum_{j=1}^\infty p_{ij}\qquad\text{and}\qquad f_Y(y_j)=P(Y=y_j)=\sum_{i=1}^\infty p_{ij}\]
for \(i=1,2,...\) and \(j=1,2,...\). Note that these functions have one dimension only.
At SuperStore 🏪, three trained employees are qualified to operate the checkout counters, restock products on the shelves, and perform some administrative tasks. SuperStore has three checkout counters, and at least one of them must always be operating.
On any given day and moment when SuperStore is open to customers, consider the following random variables:
The r.v. \(X\) has \(\Omega_X=\{1,2,3\}\) and the following pdf:
\(x\) | 1 | 2 | 3 |
---|---|---|---|
\(f_X(x)\) | 0.17 | 0.8 | 0.03 |
Consider the following table for the joint probability of \((X,Y)\)
\(X\setminus Y\) | 0 | 1 | 2 |
---|---|---|---|
1 | \(a\) | \(2b\) | \(b\) |
2 | 0.1 | \(c\) | 0 |
3 | 0.03 | 0 | 0 |
If \(P(X=1, Y=0)=0.02\), find \(b\) and \(c\)
\(X\setminus Y\) | 0 | 1 | 2 | \(\color{red}{f_X(x)}\) |
---|---|---|---|---|
1 | \(\color{red}{0.02}\) | \(2b\) | \(b\) | \(\color{red}{0.17}\) |
2 | 0.1 | \(c\) | 0 | \(\color{red}{0.8}\) |
3 | 0.03 | 0 | 0 | \(\color{red}{0.03}\) |
\(\color{red}{f_Y(y)}\) | \(\color{red}{0.15}\) | \(\color{red}{2b+c}\) | \(\color{red}{b}\) | 1 |
What is \(P(X=2|Y\geq 1)\), approximately?
Let \((X,Y)\) be a discrete random pair. \(X,Y\) are independent if, and only if: \[P(X=x, Y=y)=P(X=x)P(Y=y)\quad \forall(x,y)\in\mathbb{R}^2\]
That is, the joint probability function is the product of the marginal probability functions.
… (continued exercise) Are \(X\) and \(Y\) independent?
Definition
Let the discrete random pair \((X,Y)\) have joint probability function \(P(X=x,Y=y)\), and let \(g:\mathbb{R}^2\rightarrow\mathbb{R}\). The expected value or mean of \(g(X,Y)\) is:
\[E[g(X,Y)]=\sum_{i=1}^\infty\sum_{j=1}^\infty g(x_i,y_j)P(X=x_i,Y=y_j)\]
If \(g(x,y)=xy\), then \(E[g(X,Y)]=E[XY]\), which equals \[\sum_{i=1}^\infty\sum_{j=1}^\infty x_iy_jP(X=x_i,Y=y_j)\]
Definition
Let the discrete random pair \((X,Y)\) have joint probability function \(P(X=x,Y=y)\), and let \(\mu_X=E[X]\) and \(\mu_Y=E[Y]\). The covariance between \(X\) and \(Y\) is:
\[cov(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]\]
provided that \(E[(X-\mu_X)(Y-\mu_Y)]\) exists.
Note that this is equivalent to \(cov(X,Y)=E[XY]-E[X]E[Y]\).
The covariance tries to capture how the two r.v. move together. If it is positive, it means that both tend to go in the same direction more often than not (both above or below their means at the same time). Being negative means that more often than not when one is above its mean, the other is below.
If \(X\) and \(Y\) are independent r.v.s then \(cov(X,Y)=0\). Note that the converse is not necessarily true, i.e. \(cov(X,Y)=0\) does not imply that \(X\) and \(Y\) are independent.
Another important identity with the covariance is the following:
\[V[X\pm Y] = V[X]+V[Y]\pm 2\,cov(X,Y)\]
Knowing that \(E[Y]=0.9\), what is \(cov(X,Y)\)?
From the first table: \(E[X]=0.17\times 1 + 0.8 \times 2 + 0.03 \times 3 = 1.86\)
\[E[XY]=\sum_{x}\sum_{y}xy\,P(X=x,Y=y)=\] \[= 1 \times 0 \times 0.02 + 1\times 1 \times 0.1 + 1\times 2 \times 0.05 +\] \[+ 2\times 0 \times 0.1 + 2 \times 1 \times 0.7 + 2\times 2 \times 0 + \] \[+ 3 \times 0 \times 0.03 + 3 \times 1 \times 0 + 3 \times 2 \times 0 = 1.6\]
\(cov(X,Y)=E[XY]-E[X]E[Y]=1.6-1.86\times 0.9=-0.074\)
A caveat of the covariance is that its units depend directly on the units of \(X\) and \(Y\). The correlation coefficient allows us to express the relationship between \(X\) and \(Y\) without being affected by the units in which these r.v.s are measured.
\[\rho_{XY} = \frac{cov(X,Y)}{\sqrt{V[X]V[Y]}}=\frac{cov(X,Y)}{\sigma_X\sigma_Y}\]
Clearly \(\rho\in[-1,1]\). Note also that \(|\rho|=1\) if and only if \(P(Y=a+bX)=1\) for some \(a,b\in\mathbb{R}\) with \(b\neq 0\). If \(X\) and \(Y\) are independent r.v.s then \(\rho=0\).
Correlation coefficient | Correlation |
---|---|
\(|\rho| = 1\) | Perfect |
\(0.8 \leq |\rho| < 1\) | Strong |
\(0.5 \leq |\rho| < 0.8\) | Moderate |
\(0.1 \leq |\rho| < 0.5\) | Weak |
\(0 < |\rho| < 0.1\) | Very weak |
\(\rho= 0\) | None |
Prepend "positive" or "negative" to the correlation strength according to whether \(\rho>0\) or \(\rho<0\).
From the marginal probability functions we obtain \(V[X]\) and \(V[Y]\): \[V[X]=0.1804\text{ and }V[Y]=0.19\]
Therefore, \[\rho = \frac{cov(X,Y)}{\sigma_X\sigma_Y}=\frac{-0.074}{\sqrt{0.1804}\sqrt{0.19}}\approx-0.3997\]
We observe a weak negative linear correlation between \(X\) and \(Y\).